175 research outputs found

    WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

    Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.
    Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019) - Munich (Germany), 11-14 June 201
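    The link-tracking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expression, the function name `extract_wikilinks`, and the `redirects` mapping are assumptions made for illustration, and the sketch ignores namespaces, nested markup, and revision history.

```python
import re

# Hypothetical pattern for [[Target]], [[Target|anchor]] and [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_wikilinks(wikitext, redirects=None):
    """Return link targets written in the wikitext, resolving known redirects.

    `redirects` is an assumed mapping from an alternative title to the
    title of the article it redirects to.
    """
    redirects = redirects or {}
    targets = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip().replace("_", " ")
        if not title:
            continue
        # Wikipedia titles are case-insensitive in the first character.
        title = title[0].upper() + title[1:]
        targets.append(redirects.get(title, title))
    return targets


sample = "The [[cat|domestic cat]] belongs to [[Felidae]], like the [[lion#Taxonomy]]."
links = extract_wikilinks(sample, redirects={"Cat": "Domestic cat"})
```

    Because the sketch reads the raw wikitext rather than the rendered page, links that templates would generate at render time never appear in its output, which is the property the abstract emphasizes.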

    The "potential" face of absorptive capacity. An empirical investigation for an area of 3 European countries

    This paper draws on the multi-dimensional characterization of absorptive capacity (AC) to empirically investigate the antecedents and the effects of its "potential" dimension (PAC): i.e., the firm's capacity of acquiring and assimilating external knowledge, as distinguished from its "realized" transformation and exploitation (RAC). Based on a sample of about 10,500 firms for an area of 3 EU countries (Italy, Germany and Spain), we find that the firm's reliance on external knowledge in general increases its PAC, and that this effect is magnified by the internal shocks the firm faces. However, both these effects find relevant exceptions when different kinds of external sources are considered, at different kinds of distance from the absorbing firm. Unexpectedly, social integration mechanisms in the firm make PAC less, rather than more, conducive to innovation outcomes. On the contrary, the human capital of the firm has a positive moderating role on the PAC effects. A possible trade-off in the exploitation of the externally assimilated knowledge is suggested.
    Keywords: absorptive capacity; external knowledge; innovation

    Epidemic-style proactive aggregation in large overlay networks

    Gossip-based self-managing services for large scale dynamic networks

    Modern IP networks are dynamic, large-scale and heterogeneous. This implies that they are more unpredictable and difficult to maintain and build upon. Implementation and management of decentralized applications that exploit these networks can be enabled only through a set of special middleware services that shield the application from the scale, dynamism and heterogeneity of the environment. Among others, these services have to provide communication services (routing, multicasting, etc.) and global information like network size, load distribution, etc. The goal is not to provide abstractions that hide the distributedness of the system, but rather to hide the unpleasant features of the system, such as dynamism, scale and heterogeneity. Most importantly, these services have to be self-managing: they have to be able to maintain certain properties in the face of extreme dynamism of the network. In this manner, such services can serve as the lowest layer that makes it possible to build more complex applications, or simply as a plugin to enhance existing systems, for example, GRID environments. Apart from self-management, we require that the services be simple and lightweight, to allow easy implementation and incur low cost. Our approach to achieving these goals is based on the gossip communication model. Gossip protocols are simple, robust and scalable. Moreover, they can be applied to implement not only information dissemination, but several other functions, as we will show. So far, we have designed gossip-based protocols for maintaining random overlays, which define group membership. Based on this random overlay, we have designed gossip-based protocols to calculate aggregate values such as maxima, average, sum, variance, etc. We have also developed protocols to build several structured overlays in this framework, including superpeer, torus, ring, binary tree, etc. These protocols build on the random overlay and also on aggregate values.
    The gossip-based model is well suited to dynamic and large networks. Our protocols are extremely simple to implement while being robust and adaptive without adding any extra components or control loops. Our approach also supports composition at a local level. At each node in the network, the same services are available: for example, data aggregation uses the random overlay (peer sampling service) and superpeer topology construction applies aggregate values, such as maximal and average capacity. In fact, protocols that implement the different services are heavily interconnected and form a modular system within this lightweight self-managing service layer. While this presentation focuses on the self-managing systems services, it is clear that other application-level services can also be built at higher layers. These services can be proactive, like load balancing, which can make use of the target (average) load and overlays for optimization of load transfer, or reactive, like broadcasting or search, which can be performed on top of an appropriate overlay network (e.g., spanning tree or superpeer network) maintained by the lightweight self-managing systems services.
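    The average-aggregation idea mentioned in this abstract can be sketched as a small simulation. This is a simplified illustration rather than the authors' protocol: it assumes a fully connected set of nodes in place of the peer sampling service, synchronous rounds, and no node churn, and the name `gossip_average` and its parameters are hypothetical.

```python
import random


def gossip_average(values, rounds=30, seed=0):
    """Simulate pairwise gossip averaging over a list of local values.

    In every round each node contacts one uniformly random other node and
    both replace their estimates with the mean of the two. The sum of all
    estimates is preserved, so they converge towards the global average.
    """
    if len(values) < 2:
        return list(values)
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    estimates = [float(v) for v in values]
    n = len(estimates)
    for _ in range(rounds):
        for i in range(n):
            # Pick a peer j different from i, uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = estimates[j] = mean
    return estimates


local_loads = [0, 10, 20, 30, 40]
estimates = gossip_average(local_loads)
```

    Each node ends up holding an estimate close to the global mean without any node ever seeing all values. Other aggregates follow the same exchange pattern with a different update rule, for example keeping the larger of the two estimates to compute a maximum.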

    Which way? Direction-Aware Attributed Graph Embedding

    Graph embedding algorithms are used to efficiently represent (encode) a graph in a low-dimensional continuous vector space that preserves the most important properties of the graph. One aspect that is often overlooked is whether the graph is directed or not. Most studies ignore the directionality, so as to learn high-quality representations optimized for node classification. On the other hand, studies that capture directionality are usually effective on link prediction but do not perform well on other tasks. This preliminary study presents a novel text-enriched, direction-aware algorithm called DIAGRAM, based on a carefully designed multi-objective model to learn embeddings that preserve the direction of edges, textual features and graph context of nodes. As a result, our algorithm does not have to trade one property for another and jointly learns high-quality representations for multiple network analysis tasks. We empirically show that DIAGRAM significantly outperforms six state-of-the-art baselines, both direction-aware and oblivious ones, on link prediction and network reconstruction experiments using two popular datasets. It also achieves comparable performance on node classification experiments against these baselines using the same datasets.

    Challenge-based learning as a tool for creativity and talent expression

    After the stop caused by the pandemic, the University of Trento and its newly born FabLab reopened the doors to DigiEduHack (https://digieduhack.com/en/), the decentralized hackathon dedicated to the most pressing challenges of digital and innovative education. More than 30 multidisciplinary students have ventured into the design of innovative learning tools to meet the challenge thrown at them: prototyping educational board games; multimedia artefacts and installations at the intersection of big data, art and technology; co-designing festivals in a combination of art, science and fun; laboratory images to be presented in the classroom. In this short paper, we outline the DigiEduHack initiative as a case study, focusing on the potential of a challenge-based approach in stimulating and strengthening introspection, creative thinking and the expression of talent. Supported by a set of qualitative data collected before and after the event, this work reports an education case study and shows the progress and preliminary reflections of the students and educators involved.

    Towards Data-driven Software-defined Infrastructures

    The abundance of computing technologies and devices implies that we will live in a data-driven society in the next years. But this data-driven society requires radically new technologies in the data center to deal with data manipulation, transformation, access control, sharing and placement, among others. We advocate in this paper for a new generation of Software Defined Data Management Infrastructures covering the entire life cycle of data. On the one hand, this will require new extensible programming abstractions and services for data management in the data center. On the other hand, this also implies opening up the control plane to data owners outside the data center to manage the data life cycle. We present in this article the open challenges existing in data-driven software defined infrastructures and a use case based on Software Defined Protection of data.